Smart Paper Notes

Patch-based Object-centric Transformers for Efficient Video Generation

Wilson Yan, Ryo Okumura, Stephen James, Pieter Abbeel

Categories: Computer Vision | Machine Learning

2022-06-08

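In this work, we present the Patch-based Object-centric Video Transformer (POVT), a novel region-based video generation architecture that leverages object-centric information to efficiently model temporal dynamics in videos. We build on prior work in video prediction that uses an autoregressive transformer over the discrete latent space of compressed videos, and modify it to incorporate object-centric information via bounding boxes. Because object-centric representations are more compressible, we can improve training efficiency by letting the model access only object information for longer-horizon temporal context. When evaluated on various difficult object-centric datasets, our method achieves performance better than or equal to that of other video generation models while being more computationally efficient and scalable. In addition, we show that our method can perform object-centric control through bounding-box manipulation, which may aid downstream tasks such as video editing or visual planning. Samples are available at https://sites.google.com/view/povt-public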

translated by Google Translate
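
The efficiency argument above (recent frames in full, older frames only through object regions) can be made concrete as a token-budget calculation. This is a minimal sketch under stated assumptions, not POVT's implementation: the (T, H, W) grid of codebook indices, the (y0, x0, y1, x1) box format in latent-grid coordinates, the 512-entry codebook, and the function build_context_tokens are all hypothetical.

```python
import numpy as np

def build_context_tokens(latents, boxes, num_full_frames):
    """Flatten a clip of discrete latent codes into one token sequence (hypothetical layout).

    latents: int array (T, H, W) of codebook indices per latent cell.
    boxes: list of length T; boxes[t] holds (y0, x0, y1, x1) boxes, upper bounds exclusive.
    num_full_frames: how many of the most recent frames are kept in full.
    """
    T, H, W = latents.shape
    tokens = []
    for t in range(T):
        if t >= T - num_full_frames:
            # Recent frame: keep every latent cell.
            tokens.extend(latents[t].ravel().tolist())
        else:
            # Older frame: keep only cells covered by at least one object box.
            mask = np.zeros((H, W), dtype=bool)
            for y0, x0, y1, x1 in boxes[t]:
                mask[y0:y1, x0:x1] = True
            tokens.extend(latents[t][mask].tolist())
    return tokens

rng = np.random.default_rng(0)
latents = rng.integers(0, 512, size=(8, 16, 16))  # hypothetical 512-entry codebook
boxes = [[(2, 3, 6, 7)] for _ in range(8)]        # one 4x4 object box per frame
print(len(build_context_tokens(latents, boxes, num_full_frames=2)))  # 6*16 + 2*256 = 608
print(latents.size)                                                   # 8*256 = 2048
```

With one small object per frame, keeping only the last two frames in full shrinks the sequence from 2048 to 608 tokens; how much this saves in practice depends on object sizes and the real model's tokenizer.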

Tactile-Sensitive NewtonianVAE for High-Accuracy Industrial Connector Insertion

Ryo Okumura, Nobuki Nishio, Tadahiro Taniguchi

Categories: Robotics | Artificial Intelligence

2022-03-10

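An industrial connector insertion task requires submillimeter positioning and compensation for the grasp pose of the plug. Highly accurate estimation of the relative pose between the plug and the socket is therefore essential for completing the task. World models are a promising technology for visuomotor control because they acquire state representations suited to jointly optimizing feature extraction and a latent dynamics model. Recent studies show that NewtonianVAE, a type of world model, acquires a latent space equivalent to a mapping from images to physical coordinates, so proportional control can be performed directly in the NewtonianVAE latent space. However, applying NewtonianVAE to high-accuracy industrial tasks in physical environments remains an open problem. Moreover, the existing framework does not account for grasp pose compensation in the acquired latent space. In this work, we propose Tactile-Sensitive NewtonianVAE and apply it to USB connector insertion with grasp pose variation in a physical environment. We adopt a gel-type tactile sensor and estimate the insertion position compensated for the grasp pose of the plug. Our method trains the latent space end to end and requires no additional engineering or annotation, and simple proportional control can be used in the acquired latent space. Furthermore, we show that the original NewtonianVAE fails in some situations and demonstrate that inducing domain knowledge improves model accuracy. This domain knowledge can be easily obtained from robot specifications and grasp pose error measurements. We demonstrate that our proposed method achieves a 100% success rate and 0.3 mm positioning accuracy on the USB connector insertion task in a physical environment. It outperforms SOTA CNN-based two-stage goal pose regression with grasp pose compensation using coordinate transformation.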
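
Proportional control in such a latent space amounts to commanding an action proportional to the gap between the current and goal latent states, u = K (x_goal - x). The sketch below assumes an idealized latent in which one unit of command moves the state by exactly one unit per step; the gain, tolerance, starting point, and the function p_control are illustrative choices, not values from the paper.

```python
import numpy as np

def p_control(x0, x_goal, gain=0.5, steps=50, tol=1e-4):
    """Drive a latent position toward a goal with u = gain * (x_goal - x).

    Assumes (hypothetically) that the latent coordinate behaves like a physical
    position and that applying u moves it by exactly u per step.
    """
    x = np.asarray(x0, dtype=float)
    goal = np.asarray(x_goal, dtype=float)
    trajectory = [x.copy()]
    for _ in range(steps):
        error = goal - x
        if np.linalg.norm(error) < tol:
            break
        u = gain * error  # proportional control law
        x = x + u         # idealized latent/physical dynamics
        trajectory.append(x.copy())
    return x, trajectory

# Start 10 units right and 4 units below the goal (e.g., millimetres).
x_final, traj = p_control(x0=[10.0, -4.0], x_goal=[0.0, 0.0], gain=0.5)
```

With gain 0.5 the error halves at every step, so this idealized loop converges geometrically; on real hardware the usable gain depends on how faithfully the latent tracks physical position.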


Attention in a family of Boltzmann machines emerging from modern Hopfield networks

Toshihiro Ota, Ryo Karakida

Categories: Machine Learning | Neural and Evolutionary Computing | Machine Learning (Statistics)

2022-12-09

Hopfield networks and Boltzmann machines (BMs) are fundamental energy-based neural network models. Recent studies on modern Hopfield networks have broadened the class of energy functions and led to a unified perspective on general Hopfield networks, including an attention module. In this letter, we consider the BM counterparts of modern Hopfield networks using the associated energy functions, and study their salient properties from a trainability perspective. In particular, the energy function corresponding to the attention module naturally introduces a novel BM, which we refer to as the attentional BM (AttnBM). We verify that AttnBM has a tractable likelihood function and gradient for a special case and is easy to train. Moreover, we reveal the hidden connections between AttnBM and some single-layer models, namely the Gaussian-Bernoulli restricted BM and the denoising autoencoder with softmax units. We also investigate BMs introduced by other energy functions and, in particular, observe that the energy function of dense associative memory models gives BMs belonging to Exponential Family Harmoniums.
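
The attention link that the abstract starts from comes from the continuous modern Hopfield network, whose energy is a negative log-sum-exp over stored patterns plus a quadratic term, and whose one-step update is softmax attention with the state as query and the stored patterns as keys and values. The NumPy sketch below shows only that starting point, not AttnBM itself; the function names, pattern sizes, and beta are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def hopfield_energy(xi, X, beta):
    """E(xi) = -(1/beta) * logsumexp(beta * X @ xi) + 0.5 * ||xi||^2 (constants dropped).

    X holds one stored pattern per row.
    """
    a = beta * (X @ xi)
    lse = a.max() + np.log(np.exp(a - a.max()).sum())
    return -lse / beta + 0.5 * xi @ xi

def hopfield_update(xi, X, beta):
    """One retrieval step: softmax attention with query xi and keys = values = X."""
    return X.T @ softmax(beta * (X @ xi))

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 256))               # 5 stored patterns of dimension 256
query = X[2] + 0.1 * rng.standard_normal(256)   # noisy version of pattern 2
retrieved = hopfield_update(query, X, beta=1.0)
```

A single update retrieves the stored pattern closest to the noisy query and does not increase the energy, which is the property that makes this energy a natural starting point for an attention-based BM.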


Counterfactual Learning with General Data-generating Policies

Yusuke Narita, Kyohei Okumura, Akihiro Shimizu, Kohei Yata

Categories: Machine Learning | Artificial Intelligence | Machine Learning (Statistics)

2022-12-04

Off-policy evaluation (OPE) attempts to predict the performance of counterfactual policies using log data from a different policy. We extend its applicability by developing an OPE method for a class of both full-support and deficient-support logging policies in contextual-bandit settings. This class includes deterministic bandit algorithms (such as Upper Confidence Bound) as well as deterministic decision-making based on supervised and unsupervised learning. We prove that our method's prediction converges in probability to the true performance of a counterfactual policy as the sample size increases. We validate our method with experiments on partly and entirely deterministic logging policies. Finally, we apply it to evaluate coupon targeting policies of a major online platform and show how to improve the existing policy.
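
For context, the textbook inverse propensity weighting (IPW) estimator shows why deterministic logging policies are a problem: its weights divide by the logging probability of each logged action, so actions the logger never takes in a given context contribute nothing. The sketch below is that standard baseline, not the estimator proposed in the paper; the reward table, the two policies, and the function ipw_estimate are made up for illustration.

```python
import numpy as np

def ipw_estimate(rewards, logging_probs, target_probs):
    """IPW estimate of a target policy's value from logged bandit data.

    logging_probs[i] = pi_log(a_i | x_i) and target_probs[i] = pi_target(a_i | x_i)
    for the logged action a_i. Unbiased only if pi_log has full support wherever
    pi_target puts mass.
    """
    weights = target_probs / logging_probs
    return float(np.mean(weights * rewards))

# Hypothetical deterministic logger: always plays action a = x.
# Hypothetical target policy: always plays action 1.
x = np.array([0, 1] * 500)
a_log = x.copy()
reward_table = np.array([[0.2, 1.0],   # r(x=0, a=0), r(x=0, a=1)
                         [0.3, 1.0]])  # r(x=1, a=0), r(x=1, a=1)
r = reward_table[x, a_log]
logging_probs = np.ones_like(r)               # deterministic: pi_log(a_log | x) = 1
target_probs = (a_log == 1).astype(float)     # pi_target(a_log | x)
est = ipw_estimate(r, logging_probs, target_probs)
true_value = reward_table[:, 1].mean()
```

Here IPW returns 0.5 while the target policy's true value is 1.0: the logger never plays action 1 in context 0, so that half of the target policy's reward is invisible to the estimator. This is the deficient-support case the paper's method is designed to handle.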


Composition, Attention, or Both?

Ryo Yoshida, Yohei Oseki

Categories: Natural Language Processing

2022-10-24

In this paper, we propose a novel architecture called Composition Attention Grammars (CAGs) that recursively composes subtrees into a single vector representation with a composition function, and selectively attends to previous structural information with a self-attention mechanism. We investigate whether these components, the composition function and the self-attention mechanism, can both induce human-like syntactic generalization. Specifically, we train language models (LMs) with and without these two components, with model sizes carefully controlled, and evaluate their syntactic generalization performance against six test circuits on the SyntaxGym benchmark. The results demonstrated that the composition function and the self-attention mechanism both play an important role in making LMs more human-like, and closer inspection of linguistic phenomena implied that the composition function allowed syntactic features, but not semantic features, to percolate into subtree representations.
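
The two components the abstract isolates can be illustrated in a few lines: a recursive function that collapses each subtree into one vector, and attention of a query over previously built vectors. The abstract does not specify CAG's actual composition function, so the mean-then-tanh composition, the random toy embeddings, and the names compose and attend below are placeholders chosen only to show the data flow.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
EMBED = {w: rng.standard_normal(DIM) for w in ["the", "dog", "barks"]}  # toy word vectors
W_COMP = rng.standard_normal((DIM, DIM)) / np.sqrt(DIM)                 # toy composition weights

def compose(tree):
    """Recursively compose a (label, child, child, ...) tree into a single vector.

    Placeholder composition: tanh(W_COMP @ mean of child vectors); the label is ignored.
    """
    if isinstance(tree, str):
        return EMBED[tree]
    _label, *children = tree
    child_vecs = np.stack([compose(child) for child in children])
    return np.tanh(W_COMP @ child_vecs.mean(axis=0))

def attend(query, memory):
    """Scaled dot-product attention of one query over previously built vectors (rows of memory)."""
    scores = memory @ query / np.sqrt(query.size)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ memory

np_vec = compose(("NP", "the", "dog"))
vp_vec = compose(("VP", "barks"))
s_vec = compose(("S", ("NP", "the", "dog"), ("VP", "barks")))
context = attend(vp_vec, np.stack([np_vec, vp_vec]))
```

Ablating either piece in this toy (for example, replacing compose with a plain bag of words, or dropping attend) mirrors the with/without comparison described in the abstract, although the paper's models are trained LMs rather than fixed random weights.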


Black-box optimization for integer-variable problems using Ising machines and factorization machines

Yuya Seki, Ryo Tamura, Shu Tanaka

Categories: Machine Learning

2022-09-01

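Black-box optimization has potential in numerous applications, such as hyperparameter optimization in machine learning and optimization in the design of experiments. Ising machines are useful for binary optimization problems because each variable can be represented by a single binary variable of the Ising machine. However, conventional approaches using Ising machines cannot handle black-box optimization problems with non-binary values. To overcome this limitation, we propose an approach to black-box optimization of integer-variable problems that uses Ising/annealing machines and factorization machines in combination with three different integer-encoding methods. The approach is evaluated numerically with each encoding method on a simple problem: calculating the energy of the hydrogen molecule in its most stable state. The proposed approach can calculate the energy with any of the integer-encoding methods; however, one-hot encoding is useful for small problems.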
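
Because a factorization machine (FM) is quadratic in its inputs, once integer variables are encoded into binary variables the trained surrogate can be handed to an Ising/annealing machine as a QUBO. The sketch below makes several assumptions: it shows one-hot encoding (named in the abstract) and plain binary encoding (an illustrative second choice, not necessarily one of the paper's three), omits fitting the FM to measured energies, uses made-up weights, and replaces the Ising machine with brute-force enumeration. All helper names are hypothetical.

```python
import itertools
import numpy as np

def one_hot_encode(k, K):
    x = np.zeros(K, dtype=int)
    x[k] = 1
    return x

def one_hot_decode(x):
    # Valid only when exactly one bit is set; the penalty QUBO enforces this.
    return int(np.argmax(x)) if x.sum() == 1 else None

def binary_encode(k, n_bits):
    return np.array([(k >> b) & 1 for b in range(n_bits)], dtype=int)

def binary_decode(x):
    return int(sum(int(bit) << b for b, bit in enumerate(x)))

def fm_to_qubo(w, V):
    """Upper-triangular Q with x^T Q x = sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j (FM minus bias)."""
    Q = np.triu(V @ V.T, k=1)
    Q[np.diag_indices_from(Q)] = w  # x_i^2 = x_i for binary x
    return Q

def one_hot_penalty(K, lam):
    """QUBO for lam * (sum(x) - 1)^2 on binary x, dropping the constant lam."""
    Q = np.triu(np.full((K, K), 2.0 * lam), k=1)
    Q[np.diag_indices_from(Q)] = -lam
    return Q

def brute_force_minimize(Q):
    """Stand-in for an Ising/annealing machine: exhaustively minimize x^T Q x (small n only)."""
    n = Q.shape[0]
    best = min(itertools.product([0, 1], repeat=n),
               key=lambda x: np.array(x) @ Q @ np.array(x))
    return np.array(best)

K = 4
w = np.array([3.0, -1.0, -2.0, 0.5])  # hypothetical FM linear weights, one per one-hot bit
V = np.zeros((K, 2))                  # hypothetical pairwise factors (zero for clarity)
Q = fm_to_qubo(w, V) + one_hot_penalty(K, lam=10.0)
x_best = brute_force_minimize(Q)
print(one_hot_decode(x_best))         # 2
```

Without the penalty, setting both negative-weight bits would score lower (-3) but decode to no valid integer; the penalty makes the one-hot vector for k = 2 the unique minimum.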


Exploiting Unlabeled Data for Target-Oriented Opinion Words Extraction

Yidong Wang, Hao Wu, Ao Liu, Wenxin Hou, Zhen Wu, Jindong Wang, Takahiro Shinozaki, Manabu Okumura, Yue Zhang

Categories: Natural Language Processing

2022-08-17

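Target-oriented Opinion Words Extraction (TOWE) is a fine-grained sentiment analysis task that aims to extract the opinion words corresponding to a given opinion target in a sentence. Recently, deep learning approaches have made remarkable progress on this task. Nevertheless, TOWE still suffers from a scarcity of training data because data annotation is expensive. Limited labeled data increase the risk of distribution shift between test and training data. In this paper, we propose exploiting massive unlabeled data to reduce this risk by increasing the model's exposure to varying distribution shifts. Specifically, we propose a novel Multi-Grained Consistency Regularization (MGCR) method that makes use of unlabeled data, and we design two filters specifically for TOWE that remove noisy data at different granularities. Extensive experimental results on four TOWE benchmark datasets show the superiority of MGCR over current state-of-the-art methods. In-depth analysis also demonstrates the effectiveness of the filters at different granularities. Our code is available at https://github.com/towessl/towessl.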
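
The abstract does not spell out MGCR's losses or filters, so the sketch below shows only the generic idea that consistency regularization on unlabeled data builds on: predictions on a weakly perturbed view of a sentence serve as pseudo-labels for a strongly perturbed view, and a confidence threshold filters out noisy tokens. It assumes TOWE is framed as token tagging with three tags (as in a BIO scheme); the threshold, logits, and the function consistency_loss are illustrative, not MGCR.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def consistency_loss(weak_logits, strong_logits, threshold=0.9):
    """Token-level pseudo-label consistency on one unlabeled sentence.

    weak_logits, strong_logits: (num_tokens, num_tags) from two perturbed views.
    Tokens whose weak-view confidence falls below `threshold` are filtered out.
    Returns (mean cross-entropy over kept tokens, boolean keep mask).
    """
    weak_probs = softmax(weak_logits)
    pseudo = weak_probs.argmax(axis=-1)
    keep = weak_probs.max(axis=-1) >= threshold
    if not keep.any():
        return 0.0, keep
    log_p = np.log(softmax(strong_logits))
    nll = -log_p[np.arange(len(pseudo)), pseudo]
    return float(nll[keep].mean()), keep

# Three tokens, three tags; the middle token is uncertain and gets filtered.
weak = np.array([[10.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 10.0]])
loss_same, keep = consistency_loss(weak, weak)         # views agree: loss near 0
loss_flip, _ = consistency_loss(weak, weak[::-1])      # views disagree: large loss
```

Filtering at the token level is only one possible granularity; the paper's two filters operate at different granularities whose details are not given in the abstract.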


RealityTalk: Real-Time Speech-Driven Augmented Presentation for AR Live Storytelling

Jian Liao, Adnan Karim, Shivesh Jadon, Rubaiat Habib Kazi, Ryo Suzuki

Categories: Natural Language Processing

2022-08-12

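We present RealityTalk, a system that augments real-time live presentations with speech-driven interactive virtual elements. Augmented presentations leverage embedded visuals and animation for engaging and expressive storytelling. However, existing tools for live presentations often lack interactivity and improvisation, while creating such effects in video editing tools requires significant time and expertise. RealityTalk enables users to create live augmented presentations with real-time speech-driven interactions: the user can interactively prompt, move, and manipulate graphical elements through real-time speech and supporting modalities. Based on our analysis of 177 existing video-edited augmented presentations, we propose a novel set of interaction techniques and incorporate them into RealityTalk. We evaluate our tool from a presenter's perspective to demonstrate the effectiveness of our system.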
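
One simple way to picture speech-driven prompting is keyword matching on a live transcript: each newly transcribed fragment is scanned, and any element bound to a spoken keyword appears on stage. This is a hypothetical sketch, not RealityTalk's actual pipeline; the Stage class, the keyword-to-element mapping, and the element names are invented for illustration, and speech recognition itself is assumed to happen elsewhere.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Stage:
    keyword_to_element: dict                      # hypothetical: spoken keyword -> element id
    visible: list = field(default_factory=list)   # elements currently shown, in order of appearance

    def on_transcript(self, text):
        """Show any element whose keyword appears in a newly transcribed speech fragment."""
        for word in re.findall(r"[a-z']+", text.lower()):
            element = self.keyword_to_element.get(word)
            if element is not None and element not in self.visible:
                self.visible.append(element)
        return list(self.visible)

stage = Stage({"mars": "mars.png", "rocket": "rocket_anim"})
stage.on_transcript("Our rocket launches toward Mars")  # shows rocket_anim, then mars.png
```

Exact word matching keeps the example short; a real system would also need to handle moving and manipulating elements, plus non-speech modalities.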


Sketched Reality: Sketching Bi-Directional Interactions Between Virtual and Physical Worlds with AR and Actuated Tangible UI

Hiroki Kaimoto, Kyzyl Monteiro, Mehrad Faridan, Jiatong Li, Samin Farajian, Yasuaki Kakehi, Ken Nakagaki, Ryo Suzuki

Categories: Robotics

2022-08-12

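This paper introduces Sketched Reality, an approach that combines AR sketching and actuated tangible user interfaces (TUIs) for bi-directional sketching interaction. Bi-directional sketching enables virtual sketches and physical objects to affect each other through physical actuation and digital computation. In existing AR sketching, the relationship between the virtual and physical worlds is only one-directional: while physical interaction can affect virtual sketches, virtual sketches have no return effect on physical objects or the environment. In contrast, bi-directional sketching interaction allows seamless coupling between sketches and actuated TUIs. In this paper, we employ tabletop-size small robots (Sony Toio) and an iPad-based AR sketching tool to demonstrate the concept. In our system, virtual sketches drawn and simulated on an iPad (e.g., lines, walls, pendulums, and springs) can move, animate, collide with, and constrain physical Toio robots, as if the virtual sketches and physical objects existed in the same space, through seamless coupling between AR and robot motion. This paper contributes a set of novel interactions and a design space for bi-directional AR sketching. We demonstrate a series of potential applications, such as tangible physics education, explorable mechanisms, tangible gaming for children, and in-situ robot programming via sketching.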
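
A sketched spring acting on a physical robot can be read as a small simulation loop: compute the spring force from the robot's position, integrate, and send the resulting position to the robot as a motion target. The sketch below assumes a damped Hooke's-law spring integrated with semi-implicit Euler; the constants, units (metres), and the function spring_step are illustrative, and no Toio command API is shown or assumed.

```python
import numpy as np

def spring_step(robot_pos, robot_vel, anchor, k=4.0, damping=1.5, mass=1.0, dt=0.02):
    """One semi-implicit Euler step of a robot attached to a sketched spring at `anchor`.

    Hypothetical model: F = -k (p - anchor) - damping * v.
    """
    force = -k * (robot_pos - anchor) - damping * robot_vel
    robot_vel = robot_vel + dt * force / mass
    robot_pos = robot_pos + dt * robot_vel
    return robot_pos, robot_vel

pos, vel = np.array([0.30, 0.10]), np.zeros(2)  # robot starts 20 cm from the anchor
anchor = np.array([0.10, 0.10])
for _ in range(2000):                           # 40 s of simulated time
    pos, vel = spring_step(pos, vel, anchor)
    # In a real system, `pos` would be sent to the robot as its next target here.
```

For the other direction of coupling, the robot's measured position (for example, after a user pushes it by hand) would be written back into `pos` before the next step, so the virtual spring responds to the physical world.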


Selective Self-Assembly using Re-Programmable Magnetic Pixels

Martin Nisser, Yashaswini Makaram, Faraz Faruqi, Ryo Suzuki, Stefanie Mueller

Categories: Robotics

2022-08-07

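This paper introduces a method for generating highly selective encodings that can be magnetically "programmed" onto physical modules so that they self-assemble into chosen configurations. We generate these encodings from Hadamard matrices and show how to design module faces to be maximally attractive to their intended mate while remaining maximally agnostic to other faces. We derive guarantees on these bounds and verify the attraction and agnosticism experimentally. Using cubic modules whose faces are covered in soft magnetic material, we show how inexpensive, passive modules with planar faces can selectively self-assemble into target shapes without geometric guides. We show that these modules can be easily re-programmed for new target shapes using a CNC-based magnetic plotter, and we demonstrate the self-assembly of 8 cubes in a water tank.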
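
The selectivity of Hadamard-based encodings follows from the orthogonality of Hadamard rows. If each face carries a row of ±1 pixel polarities and its intended mate carries the negated row, the net attraction between mates is the full pixel count, while the net attraction to any other face's mate is exactly zero. The sketch below uses Sylvester's construction and a simplified scoring model (+1 per opposite-polarity pixel pair, -1 per same-polarity pair) that ignores mirroring of facing layouts and real magnetic force falloff; the function names are illustrative.

```python
import numpy as np

def sylvester_hadamard(n):
    """Hadamard matrix of order n (a power of two) by Sylvester's construction."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def attraction(face_a, face_b):
    """Simplified net attraction: +1 per opposite-polarity pixel pair, -1 per same-polarity pair."""
    return int(-(face_a * face_b).sum())

H = sylvester_hadamard(8)  # 8 codes of 8 pixels each
# Face i uses code H[i]; its intended mate uses -H[i].
scores = np.array([[attraction(H[i], -H[j]) for j in range(8)] for i in range(8)])
print(scores)  # 8 on the diagonal (intended mates), 0 everywhere else
```

The zero off-diagonal scores are the "maximally agnostic" property: under this idealized model a face feels no net pull toward any face other than its mate.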
